1 R Setup and Required Packages

In this project, we use the open-source R programming language to model and monitor changes in the gait cycles over time. R is maintained by an international team of developers who make the language available through The Comprehensive R Archive Network (CRAN). Readers interested in reusing our code and reproducing our results should have R installed locally on their machines. R can be installed on a number of different operating systems (see Windows, Mac, and Linux for the installation instructions for these systems). We also recommend using the RStudio interface for R, which can be downloaded for free by following the instructions at the link. For non-R users, we recommend Hands-On Programming with R for a brief overview of the software’s functionality. Hereafter, we assume that the reader has an introductory understanding of the R programming language.

In the code chunk below, we load the packages used to support our analysis.

if(require(checkpoint)==FALSE) install.packages("checkpoint") # check to see if checkpoint is installed; if not, install it
library(checkpoint) # package used to facilitate the reproducibility of our work

# a checkpoint of R packages on CRAN on June 08, 2020 to enable the reproduction of our work in the future
checkpoint("2020-06-08")

# install any of the required packages that are not already installed
packages = c("tidyverse", "magrittr", # typical data analysis packages
             "foreach", "parallel", # packages used for parallelizing some code chunks
             "MALDIquant", # match closest points between two vectors
             "MASS", # calculates the generalized inverse matrix
             "R.matlab")
newPackages = packages[!(packages %in% installed.packages()[,"Package"])]
if(length(newPackages) > 0) install.packages(newPackages)

# using the library command to load all packages; invisible used to avoid printing all packages and dependencies used
invisible(lapply(packages, library, character.only = TRUE))

source("./Functions.R") # our custom built functions

set.seed(2020)
startTime <- Sys.time()

2 Extracting, Transforming and Loading (ETL) the Gait Data into R

In this paper, we perform a secondary data analysis of the experimental data reported in Baghdadi et al. (2019), where fifteen participants were equipped with an IMU sensor wrapped on their right ankle. We used the IMU data to transform the linear acceleration signals from the local frame of reference to the global frame of reference. Next, we calculated the acceleration magnitude signal and smoothed it by applying a low-pass Butterworth filter with a cut-off frequency of 10 Hz. Finally, we segmented the individual gait cycles from foot-flat to foot-flat by tuning the parameters of the segmentation algorithm for each participant.

We load four different sets of data into R:

  • Subject heights, which are read directly from the GitHub Repository of the Baghdadi et al. (2019) paper. Note that we have excluded Subject 13’s data due to ———–we need to insert some reasons here—————.
  • Features, where we extract the x and y positions as well as the stride duration and experimental time. The first three elements are used to compute the stride length, height, and duration, which are three important kinematic features that can be used to model/monitor changes in gait. Note that we did not use this data directly from the Baghdadi et al. (2019) paper since our segmentation here was from foot-flat to foot-flat and a different frequency for the Butterworth filter was used (given that the goals of our paper and the original paper are different).
  • Profiles, where the points forming each gait segment are maintained. The profiles present an alternative approach to monitoring and modeling changes in the gait cycles.
  • Video tags, where we have coded when subjects enter and leave the main laboratory room. This information is used to ensure the consistency of the task being performed by the participant; i.e., we do not want the detected changes to correspond to a task change, but rather to indicate fatigue (see the discussion in our manuscript for more details).

The code chunk below provides our code for reading these four datasets. The subject heights are saved into a vector, which we use to scale the computed gait/stride length and height. The main outputs from this chunk are two lists: (a) subjectFeaturesList, which contains the scaled stride length, scaled stride height, stride duration, and experimental time; and (b) subjectProfilesList, which contains the sampled IMU acceleration points making up each acceleration profile.

# Reading the subject Heights from the Baghdadi et al. (2019) GitHub Repo:
# ------------------------------------------------------------------------
subject_heights <- read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/DemographicData.csv") %>%
  pull(Height) %>% `*` (0.01) %>% # pull heights and convert from cm to meters
  .[-13] # drop Subject 13 (excluded from our analysis)

# Computing gait features based on experimental data from Baghdadi et al. (2019):
# -------------------------------------------------------------------------------
featureFiles <- list.files(path = "../Data/matFiles/Features/", pattern = "*.mat", full.names = TRUE)
rawSubjectsData <- lapply(featureFiles, readMat) %>% # read all mat files in Feature List
  lapply("[[", "M.i.k") # extract sublist titled "M.i.k" from all the lists
names(rawSubjectsData) <- str_extract_all(featureFiles, "Subject[:digit:]*", simplify = TRUE)

numRows <- lengths(rawSubjectsData)/17 # number of observations per Subject

subjectFeaturesList <- vector(mode = "list", length = length(rawSubjectsData)) # initialize list
names(subjectFeaturesList) <- names(rawSubjectsData)

for (i in seq(length(rawSubjectsData))) {
  gaitLength <-  rawSubjectsData[[i]][ 1:numRows[i] ] %>% 
    lapply(range) %>% lapply(diff) %>% unlist() %>% 
    `/` (subject_heights[i])
  
  gaitHeight <- rawSubjectsData[[i]][ (1+numRows[i]):(2*numRows[i]) ] %>% 
    lapply(range) %>% lapply(diff) %>% unlist() %>% 
    `/` (subject_heights[i])
  
  gaitDuration <- rawSubjectsData[[i]][ (1+ 2*numRows[i]):(3*numRows[i]) ] %>% 
    lapply(range) %>% lapply(diff) %>% unlist(recursive = TRUE)
  
  expTime <- rawSubjectsData[[i]][ (1+ 16*numRows[i]):(17*numRows[i]) ] %>% 
    lapply(unlist) %>% lapply(min) %>% unlist()
  
  subjectFeaturesList[[i]] <- cbind(gaitLength, gaitHeight, gaitDuration, expTime)
}


# Reading the segmented gait cycles for each participant:
# -------------------------------------------------------
profileFiles <- list.files(path = "../Data/matFiles/Profiles/", pattern = "*.mat", full.names = TRUE) 
subjectProfilesList <- lapply(profileFiles, readMat) %>% # read all mat files from the path above
  lapply("[[", "gait") # extract sublist titled "gait" from all the lists
names(subjectProfilesList) <- sapply(profileFiles, str_extract_all, "Subject[:digit:]*", simplify = TRUE)

# Reading task-related data based on the tagging of the experimental videos:
# --------------------------------------------------------------------------
videoFiles <- list.files(path = "../Data/csvFiles/", pattern = "*.csv", full.names = TRUE) 
videoDataList <- lapply(videoFiles, read.csv)
names(videoDataList) <- sapply(videoFiles, str_extract_all, "Subject[:digit:]*", simplify = TRUE)

3 Data Preprocessing and Rational Subgroups

3.1 Data Preprocessing

In Section 2, we read four different data sets and extracted the three gait features. In this subsection, we perform further preprocessing of the data. Specifically, we remove the first 10 minutes of experimental data from the videoDataList since these corresponded to the warm-up period of the experiment, as detailed in Maman et al. (2017), Baghdadi et al. (2019), and Maman et al. (2020). Note that we drop any task that spanned the warm-up period in its entirety, rather than truncating such tasks.

####################################################################
# Note: this chunk depends on the previous chunk having been run.
# knitr's "dependson" option does not resolve this; to be revisited.
####################################################################

for (i in seq(length(rawSubjectsData))) {
  videoDataList[[i]] %<>%
    dplyr::select(Leaves, Enters) %>%
    apply(MARGIN=2, FUN=na.omit) %>% # removing NA observations
    as.data.frame %>%
    mutate(Enters = Enters - 600, Leaves = Leaves - 600) %>% # subtracting 600 seconds for warm-up
    filter(Enters > 0 & Leaves > 0) # removing observations/tasks, which contained any of the warm-up period
}

3.2 Rational Subgroups

Given the nature of the experimental study, the subjects performed varying tasks that may affect our gait analysis; we have only examined data where the subjects are outside of the main laboratory room, since this captures the material-handling component of the examined task. An immediate consequence of using only “out-of-the-room” data is that the time differences between the retained segmented gait cycles are no longer approximately constant, since relatively large time gaps exist that correspond to when the participant is in the room. For this reason, we have subgrouped the data: consecutive “out-of-the-room” gait cycles are placed in one subgroup. Statistically speaking, our subgrouping strategy follows the rational subgrouping principle in statistical surveillance. The purpose of rational subgrouping is two-fold:

  • Minimize the variability within a subgroup such that the within-group variability captures the natural variability in the gait cycles.
  • Allow the between-group variability to capture the effects of “assignable causes of variability”, which can include disease progression, fatigue, etc.

We refer the reader to Montgomery (2019) for more details on rational subgrouping. The code chunk below provides our approach to creating the rational subgroups for both the “profile” and “gait features” data sets.
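The subgrouping code relies on MALDIquant::match.closest() to align the video-tagged leave/enter times with the gait-cycle experimental times. As a base-R sketch of what this matching does (closest_idx is a hypothetical helper for illustration only, not part of our Functions.R):

```r
# hypothetical base-R equivalent of MALDIquant::match.closest():
# for each query time, return the index of the closest entry in a sorted table
closest_idx <- function(x, table) {
  sapply(x, function(v) which.min(abs(table - v)))
}

expT <- c(0, 10.2, 20.5, 30.9) # hypothetical sorted experimental times (s)
tags <- c(9.8, 21.0)           # hypothetical video leave/enter timestamps (s)
closest_idx(tags, expT)        # -> 2 3
```

match.closest() additionally exploits the sorted table and supports a tolerance argument, which is why we use it in the chunk below.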

numRows <- lengths(subjectProfilesList)/5
confidence <- videoDataList %>% lapply(function(x) {mean(x[,2] - x[,1])}) %>%
  do.call(what=c) %>% `*` (0.05) # 5% confidence for each subgroup

subgroupList <- list()
for (id in seq(length(rawSubjectsData))) {

  aM <- subjectProfilesList[[id]] %>% do.call(what=c) %>%
    .[(numRows[[id]]+1):(2*numRows[id])] %>%
    lapply(as.vector)

  expT <- subjectProfilesList[[id]] %>% do.call(what=c) %>%
    .[(4*numRows[[id]]+1):(5*numRows[id])] %>%
    lapply(function(x) {x[1,1]}) %>% do.call(what=c)

  ###########################################################
  # Match times of walking cycles, extracted from videos, against the experimental times
  leaveIdx <- match.closest(videoDataList[[id]]$Leaves + confidence[id], expT)
  enterIdx <- match.closest(videoDataList[[id]]$Enters - confidence[id], expT)

  rmIdx <- c(3000, which(enterIdx - leaveIdx < 5)) # 3000 is an out-of-range sentinel so negative indexing is safe when nothing qualifies for removal
  leaveIdx <- leaveIdx[-rmIdx]
  enterIdx <- enterIdx[-rmIdx]

  ########################################################## Create subgroups
  aMList <- list()
  LHDList <- list()
  for (i in 1:length(leaveIdx)) {
    aMList[[i]] <- aM[leaveIdx[i]:enterIdx[i]]
    LHDList[[i]] <- subjectFeaturesList[[id]][leaveIdx[i]:enterIdx[i],]
  }

  ######################################### Remove subgroups with length < 20
  LHDList %<>% .[-{which(lengths(aMList) < 20) %>% c(3000)}]
  aMList %<>% .[-{which(lengths(aMList) < 20) %>% c(3000)}]

  ######################################### Remove profiles shorter than the 2nd-percentile profile length; all remaining profiles are then cut at this length (equal number of features)
  cutLen <- aMList %>% do.call(what=c) %>% lengths() %>% quantile(probs=0.02)
  for (i in 1:length(aMList)) {
    rmIdx <- which(lengths(aMList[[i]]) < cutLen) %>% c(3000)
    aMList[[i]] <- aMList[[i]][-rmIdx]
    LHDList[[i]] <- LHDList[[i]][-rmIdx,]
  }

  ######################################## Create a matrix with cutLen number of columns in each profile subgroup
  aMList %<>% lapply(function(x) {do.call(rbind, x)}) %>%
    lapply(function(x) {x[,1:cutLen]})

  ############################################ Phase 0: subgroups 1-10; Phase 1: subgroups 11-30; Phase 2: the rest
  subgroupList[[id]] <- list(Acc = list(X0=aMList[1:10], X1=aMList[11:30], X2=aMList[31:length(aMList)]),
                             LHD = list(X0=LHDList[1:10], X1=LHDList[11:30], X2=LHDList[31:length(LHDList)]))
}
names(subgroupList) <- c(paste0("subject", c(1:12, 14:15)))

4 \(T^2\) Calculations

4.1 \(T^2\) Calculations on Gait Acceleration Profiles

Initialize.
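For reference, the statistic computed in the chunk below is the standard subgroup Hotelling form; in our notation,

\[ T_i^2 = \left(\bar{x}_i - \bar{\bar{x}}_0\right)^\top S_w^{-} \left(\bar{x}_i - \bar{\bar{x}}_0\right), \]

where \(\bar{x}_i\) is the mean of subgroup \(i\), \(\bar{\bar{x}}_0\) is the pooled phase 0 mean after outlier removal, and \(S_w^{-}\) denotes the generalized inverse (via MASS::ginv) of the within-subgroup covariance matrix.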

T2Profiles <- list()
for (id in seq(length(subgroupList))) {
  X0 <- subgroupList[[id]]$Acc$X0

  ############################# Phase0
  ############################# Remove single outliers in number of passes
  pass <- 1
  for (i in 1:pass) {
    UCL0 <- BootstrapUCL0(x0=X0, alpha=0.05, iterations=200) # Find in Functions
    ######################### Find T2 for original X0 and remove outliers from subgroups
    X0Pooled <- do.call(rbind, X0)
    T2 <- X0Pooled %>% apply(FUN=mean, MARGIN=2) %>%
      rep.row(n=nrow(X0Pooled)) %>% -(X0Pooled) %>%
      apply(MARGIN=1, function(x) {x %*% ginv(cov(X0Pooled)) %*% x})

    tmp <- mapply(function(x,y) {rep(y, x)}, x=sapply(X0, nrow), y=1:10) %>% do.call(what=c)
    OCIdx <- T2 %>% split(f=tmp) %>% lapply(function(x) {c(which(x>UCL0), 3000)})
    X0 %<>% mapply(FUN=function(x, y) {x[-y,]}, y=OCIdx) # Updated X0 after phase0
  }

  X0Pooled <- do.call(rbind, X0)
  X0PooledMean <- colMeans(X0Pooled)

  ############################# Within batch covariance; computed from phase0 after outlier removal
  S_w <- X0 %>% lapply(colMeans) %>%
    mapply(FUN=rep.row, sapply(X0, nrow)) %>% # Mean of each subgroup
    mapply(FUN=function(x,y) {x-y}, y=X0, SIMPLIFY=F) %>%
    lapply(function(x) {t(x) %*% x}) %>%
    Reduce(f=`+`) %>%
    `/` (sum(lengths(X0)/ncol(X0[[1]])) - length(X0))

  ############################# Phase1&2
  ############################# Read phase 1&2 subgroups
  X1 <- subgroupList[[id]]$Acc$X1
  X2 <- subgroupList[[id]]$Acc$X2

  ######################### Compute T2 in phase1
  X1Mean <- sapply(X1, colMeans) %>% t()
  T2 <- rep.row(X0PooledMean, length(X1)) %>% -(X1Mean) %>%
    apply(MARGIN=1, function(x) x %*% ginv(S_w) %*% x)

  ######################### Bootstrap UCL using phase 1 data
  sampled <- sample(T2, 1000*length(T2), replace=T) %>% matrix(ncol=length(T2), byrow=T)
  UCL <- apply(sampled, 1, quantile, probs=0.99) %>% mean()

  ######################### T2 for phase2 data
  X12Mean <- sapply(c(X1, X2), colMeans) %>% t()
  T2 <- rep.row(X0PooledMean, length(c(X1, X2))) %>% -(X12Mean) %>%
    apply(MARGIN=1, function(x) x %*% ginv(S_w) %*% x)
  T2Profiles[[id]] <- list(T2=T2, UCL=UCL)
}
names(T2Profiles) <- paste0("subject", c(1:12, 14:15))
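Both BootstrapUCL0() and rep.row() are defined in our Functions.R and are not reproduced here. As a rough sketch of the bootstrapped control-limit idea, bootstrapUCL below is a hypothetical stand-in, assuming the limit is the average of resampled \(1-\alpha\) quantiles, mirroring the phase 1 UCL computation above:

```r
# hypothetical sketch of a bootstrapped upper control limit:
# resample the T2 values with replacement, take the (1 - alpha) quantile
# of each resample, and average the quantiles across iterations
bootstrapUCL <- function(t2, alpha = 0.01, iterations = 1000) {
  qs <- replicate(iterations,
                  quantile(sample(t2, length(t2), replace = TRUE),
                           probs = 1 - alpha))
  mean(qs)
}

set.seed(2020)
bootstrapUCL(rchisq(100, df = 3)) # illustrative chi-squared "T2" values
```

Averaging the bootstrapped quantiles (rather than taking a single empirical quantile) stabilizes the limit when the number of phase 1 subgroups is small.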

4.2 Plots of \(T^2\) on Gait Acceleration Profiles along with Reported RPE

Initialize.

RPE <- read.csv("https://raw.githubusercontent.com/fmegahed/gait_hotelling/master/Data/csvFiles/RPE/RPE.csv") %>% .[,-14]

SubgroupTimeList <- subgroupList %>% lapply(function(x) {x$LHD[c(2,3)]}) %>%
  lapply(function(x) {do.call(c, x)}) %>%
  lapply(function(x) {sapply(x, function(a) {median(a[,4])})})

for (id in seq(length(subgroupList))) {
  # for (id in 1:2) {
  T2 <- T2Profiles[[id]]$T2
  T2Time <- SubgroupTimeList[[id]]
  UCL <- T2Profiles[[id]]$UCL
  color <- ifelse(T2>UCL, "OC", "IC")
  T2.df <- data.frame(T2Time=T2Time, T2=T2, col=color,
                      shape=c(rep("Phase 1", 20), rep("Phase 2", length(T2) - 20)))
  
  RPE.df <- data.frame(RPETime=RPE[,1], RPE=RPE[,id+1])
  
  ############# Needed to plot the secondary axis
  ylim.prim <- range(pretty(T2))
  ylim.sec <- c(6,20)
  
  b <- diff(ylim.prim)/diff(ylim.sec)
  a <- b*(ylim.prim[1] - ylim.sec[1])
  
  ############# 
  
  T2Plot <- ggplot(data=T2.df) +
    geom_point(aes(x=T2Time, y=T2, shape=shape, col=col), size=2.5) +
    geom_line(aes(x=T2Time, y=T2)) +
    xlab("Experiment Time (s)") + ylab(expression(paste("T"^2, " Profiles"))) +
    
    geom_line(data=RPE.df, mapping=aes(x=RPETime, y=a+RPE*b), color="blue", size=1) +
    # annotate("text", x=RPE.df$RPETime[5], y=(RPE.df$RPE[5]*b*1.1)+a, label="RPE", 
    #        size=5, color="blue") +
    scale_y_continuous(sec.axis=sec_axis(trans=~ (. - a)/b,
                                       name="RPE Borg Scale")) +

    geom_hline(yintercept=UCL) +
    annotate("text", 2, UCL, vjust=-1, label="UCL", size=5) +

    scale_shape_manual(values = c(6, 16)) +
    scale_color_manual(values = c("black", "red")) +
    ggtitle(paste0("Subject ", id)) +

    theme_bw() +
    theme(plot.title=element_text(size = 20, hjust=0.5, margin = margin(t=10, b=-30)),
          axis.text=element_text(size=15),
          axis.title=element_text(size=20, color="black"),
          legend.title=element_blank(), # This removes the legend title
          legend.position=c(.1, .68),
          legend.text=element_text(size=10),
          legend.key = element_rect(fill="grey"),
          axis.title.y.right=element_text(color="blue"),
          axis.text.y.right=element_text(color="blue"))

  cat('###',paste0("Subject", id), "{-}",'\n')
  print(T2Plot)
  cat('\n \n')
}

(Rendered output: one tabbed \(T^2\) profile plot with the overlaid RPE curve per participant, labeled Subject1 through Subject14.)

4.3 \(T^2\) Calculations on Gait Features

Initialize.

T2Features <- list()
SWList <- list()
X0Updated <- list()
for (id in seq(length(subgroupList))) {
  X0 <- subgroupList[[id]]$LHD$X0 %>% lapply(function(x) {x=x[,1:3]}) # Last column contains experiment time

  ############################# Phase0
  ############################# Remove single outliers in number of passes
  pass <- 1
  for (i in 1:pass) {
    UCL0 <- BootstrapUCL0(x0=X0, alpha=0.05, iterations=200) # Find in Functions
    ######################### Find T2 for original X0 and remove outliers from subgroups
    X0Pooled <- do.call(rbind, X0)
    T2 <- rep.row(colMeans(X0Pooled), n=nrow(X0Pooled)) %>% -(X0Pooled) %>%
      apply(MARGIN=1, function(x) {x %*% ginv(cov(X0Pooled)) %*% x})

    tmp <- mapply(function(x,y) {rep(y, x)}, x=sapply(X0, nrow), y=1:10) %>% do.call(what=c)
    OCIdx <- T2 %>% split(f=tmp) %>% lapply(function(x) {c(which(x>UCL0), 3000)})
    X0 %<>% mapply(FUN=function(x, y) {x[-y,]}, y=OCIdx) # Updated X0 after phase0
  }
  X0Pooled <- do.call(rbind, X0)
  X0PooledMean <- colMeans(X0Pooled)

  ############################# Within batch covariance; computed from phase0 after outlier removal
  S_w <- X0 %>% lapply(colMeans) %>%
    mapply(FUN=rep.row, sapply(X0, nrow)) %>% # Mean of each subgroup
    mapply(FUN=function(x,y) {x-y}, y=X0, SIMPLIFY=F) %>%
    lapply(function(x) {t(x) %*% x}) %>%
    Reduce(f=`+`) %>%
    `/` (sum(lengths(X0)/ncol(X0[[1]])) - length(X0))

  ############################# Phase1&2
  ############################# Read phase 1&2 subgroups
  X1 <- subgroupList[[id]]$LHD$X1 %>% lapply(function(x) {x=x[,1:3]})
  X2 <- subgroupList[[id]]$LHD$X2 %>% lapply(function(x) {x=x[,1:3]})

  ######################### Compute T2 in phase1
  X1Mean <- sapply(X1, colMeans) %>% t()
  T2 <- rep.row(X0PooledMean, length(X1)) %>% -(X1Mean) %>%
    apply(MARGIN=1, function(x) x %*% ginv(S_w) %*% x)

  ######################### Bootstrap UCL using phase 1 data
  sampled <- sample(T2, 1000*length(T2), replace=T) %>% matrix(ncol=length(T2), byrow=T)
  UCL <- apply(sampled, 1, quantile, probs=0.99) %>% mean()

  ######################### T2 for phase2 data
  X12Mean <- sapply(c(X1, X2), colMeans) %>% t()
  T2 <- rep.row(X0PooledMean, length(c(X1, X2))) %>% -(X12Mean) %>%
    apply(MARGIN=1, function(x) x %*% ginv(S_w) %*% x)

  T2Features[[id]] <- list(T2=T2, UCL=UCL)
  X0Updated[[id]] <- list(X0=X0)
  SWList[[id]] <- list(S_w=S_w)
}
names(T2Features) <- paste0("subject", c(1:12, 14:15))
names(X0Updated) <- paste0("subject", c(1:12, 14:15))
names(SWList) <- paste0("subject", c(1:12, 14:15))

4.4 Plots of \(T^2\) on Gait Features along with Reported RPE

Initialize.

RPE <- read.csv("https://raw.githubusercontent.com/fmegahed/gait_hotelling/master/Data/csvFiles/RPE/RPE.csv") %>% .[,-14]

SubgroupTimeList <- subgroupList %>% lapply(function(x) {x$LHD[c(2,3)]}) %>%
  lapply(function(x) {do.call(c, x)}) %>%
  lapply(function(x) {sapply(x, function(a) {median(a[,4])})})

for (id in seq(length(subgroupList))) {
  # for (id in 1:2) {
  T2 <- T2Features[[id]]$T2
  T2Time <- SubgroupTimeList[[id]]
  UCL <- T2Features[[id]]$UCL
  color <- ifelse(T2>UCL, "OC", "IC")
  T2.df <- data.frame(T2Time=T2Time, T2=T2, col=color,
                      shape=c(rep("Phase 1", 20), rep("Phase 2", length(T2) - 20)))
  
  RPE.df <- data.frame(RPETime=RPE[,1], RPE=RPE[,id+1])
  
  ############# Needed to plot the secondary axis
  ylim.prim <- range(pretty(T2))
  ylim.sec <- c(6,20)
  
  b <- diff(ylim.prim)/diff(ylim.sec)
  a <- b*(ylim.prim[1] - ylim.sec[1])
  
  ############# 
  
  T2Plot <- ggplot(data=T2.df) +
    geom_point(aes(x=T2Time, y=T2, shape=shape, col=col), size=2.5) +
    geom_line(aes(x=T2Time, y=T2)) +
    xlab("Experiment Time (s)") + ylab(expression(paste("T"^2, " Features"))) +
    
    geom_line(data=RPE.df, mapping=aes(x=RPETime, y=a+RPE*b), color="blue", size=1) +
    # annotate("text", x=RPE.df$RPETime[5], y=(RPE.df$RPE[5]*b*1.1)+a, label="RPE", 
    #        size=5, color="blue") +
    scale_y_continuous(sec.axis=sec_axis(trans=~ (. - a)/b,
                                       name="RPE Borg Scale")) +

    geom_hline(yintercept=UCL) +
    annotate("text", 2, UCL, vjust=-1, label="UCL", size=5) +

    scale_shape_manual(values = c(6, 16)) +
    scale_color_manual(values = c("black", "red")) +
    ggtitle(paste0("Subject ", id)) +

    theme_bw() +
    theme(plot.title=element_text(size = 20, hjust=0.5, margin = margin(t=10, b=-30)),
          axis.text=element_text(size=15),
          axis.title=element_text(size=20, color="black"),
          legend.title=element_blank(), # This removes the legend title
          legend.position=c(.1, .68),
          legend.text=element_text(size=10),
          legend.key = element_rect(fill="grey"),
          axis.title.y.right=element_text(color="blue"),
          axis.text.y.right=element_text(color="blue"))

  cat('###',paste0("Subject", id), "{-}",'\n')
  print(T2Plot)
  cat('\n \n')
}

(Rendered output: one tabbed \(T^2\) gait-features plot with the overlaid RPE curve per participant, labeled Subject1 through Subject14.)

5 Diagnosis of Gait Features

5.1 Decomposition of Features \(T^2\)

Initialize.
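The decomposition below examines \(T^2\) restricted to subsets of the features (init.decomp() in our Functions.R enumerates these subsets). For a subset \(b\) of the three features,

\[ T_b^2 = \left(\bar{\bar{x}}_{0,b} - \bar{x}_{i,b}\right)^\top S_{w,bb}^{-1} \left(\bar{\bar{x}}_{0,b} - \bar{x}_{i,b}\right), \]

where the subscript \(b\) selects the corresponding elements of the mean vectors and rows/columns of \(S_w\). Each \(T_b^2\) is compared against its own bootstrapped control limit; a large value flags the features in \(b\) as contributors to the out-of-control signal.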

qList <- list()
for (id in seq(length(subgroupList))) {
  ############### Read inputs
  X0 <- X0Updated[[id]]$X0
  X1 <- subgroupList[[id]]$LHD$X1 %>% lapply(function(x) {x=x[,1:3]})
  X2 <- subgroupList[[id]]$LHD$X2 %>% lapply(function(x) {x=x[,1:3]})
  X0PooledMean <- do.call(rbind, X0) %>% colMeans()
  X1Mean <- sapply(X1, colMeans) %>% t()
  X12Mean <- sapply(c(X1, X2), colMeans) %>% t()
  S_w <- SWList[[id]]$S_w

  T2 <- T2Features[[id]]$T2
  UCL <- T2Features[[id]]$UCL
  t3 <- which(T2 > UCL)
  p <- 3 # number of features
  alpha <- 0.01

  ############## T2 Decomposition and UCL's
  q <- init.decomp(p=p) # initialize the decomposition matrix
  for(i in 1:nrow(q)) {# Control limits for each component of the decomposition
    b <- subset(q[i,3:ncol(q)], q[i,3:ncol(q)] > 0)

    T2 <- rep.row(X0PooledMean[b], length(X1)) %>% -(X1Mean[,b]) %>%
      apply(MARGIN=1, function(x) x %*% solve(S_w[b,b]) %*% x)

    sampled <- matrix(sample(T2, 1000 * length(T2), replace=T), ncol=20, byrow=T)
    q[i,2] <- apply(sampled, 1, quantile, probs=1-alpha) %>% mean()
  }

  ############## Decompose T2 for every out of control data point
  tmpList <- list()
  for(ii in 1:length(t3)){
    for(i in 1:nrow(q)) {
      b <- subset(q[i,3:ncol(q)], q[i,3:ncol(q)] > 0)
      q[i,1] <- t(X0PooledMean[b]-X12Mean[t3[ii],][b]) %*%
        solve(S_w[b,b]) %*%
        (X0PooledMean[b]-X12Mean[t3[ii],][b])
    }
    ############ Store decomposed elements
    tmpList[[ii]] <- q
  }
  names(tmpList) <- paste0("subgroup ", t3)
  qList[[id]] <- tmpList
}
names(qList) <- paste0("subject", c(1:12, 14:15))

5.2 2D Ellipse plots

Initialize.

################ Select subject number and features
id <- 1 # Select the subject number
f <- c(2,3) # Select 2 features by interpreting the T2 decomposition

################ Read inputs
X0 <- X0Updated[[id]]$X0
X1 <- subgroupList[[id]]$LHD$X1 %>% lapply(function(x) {x=x[,1:3]})
X2 <- subgroupList[[id]]$LHD$X2 %>% lapply(function(x) {x=x[,1:3]})
X0Pooled <- do.call(rbind, X0)
X0PooledMean <- do.call(rbind, X0) %>% colMeans()
X1Pooled <- do.call(rbind, X1)
S_w <- SWList[[id]]$S_w
q <- qList[[id]][[1]] # Read one of the q's as its 2nd column (q[,2]) are the control limits and thus the same across all subgroups in one subject
T2 <- T2Features[[id]]$T2
UCL <- T2Features[[id]]$UCL

################ Univariate CL's and 2D plot limits
XLim <- c(X0PooledMean[f[1]]-sqrt(S_w[f[1],f[1]]*q[f[1],2]),
          X0PooledMean[f[1]]+sqrt(S_w[f[1],f[1]]*q[f[1],2]))

YLim <- c(X0PooledMean[f[2]]-sqrt(S_w[f[2],f[2]]*q[f[2],2]),
          X0PooledMean[f[2]]+sqrt(S_w[f[2],f[2]]*q[f[2],2]))

xlim <- c(min(XLim) - 1.2*diff(XLim), max(XLim) + 1.2*diff(XLim))
ylim <- c(min(YLim) - 1.2*diff(YLim), max(YLim) + 1.2*diff(YLim))

################ Create grid for the ellipse contour plot
ngrid <- 100
grid <- cbind(seq(xlim[1], xlim[2], length = ngrid),
              seq(ylim[1], ylim[2], length = ngrid))

################  T2 on the grid points
T2_grid <- apply(expand.grid(grid[,1], grid[,2]), 1,
                 function(x) {(x- X0PooledMean[f]) %*% solve(S_w[f,f]) %*% (x- X0PooledMean[f])})
T2_grid <- matrix(T2_grid, ngrid, ngrid)

######################################################## Plot
############## Plot the contour
# old.par <- par()
par(mar=c(5, 5, 4, 2) + 0.1)

plot(1, 1, type = "o", xlim = xlim, ylim = ylim, main="Ellipse Chart", cex.main=1.5,
     xlab=paste0(colnames(X0[[1]])[f[1]], " (m)"),
     ylab=paste0(colnames(X0[[1]])[f[2]], " (s)"),
     cex.axis=1.5, cex.lab=1.5)

legend("topleft", legend=c("phase 0 subgroup means", "phase 1 subgroup means",
                           "X control limits", "Y control limits"),
       col=c("green", "blue", "black", "brown"), pch=c(3,3,NA,NA), lty=c(NA,NA,2,2),
       cex=1, pt.cex=2, lwd=2, bg="grey")
################
idx <- apply(q[,3:5], MARGIN=1, function(r) all(c(f,0) %in% r)) %>% which()
contour(grid[,1], grid[,2], T2_grid, levels=q[idx,2], drawlabels=T, add=T)

################ Plot phase 0 data
center0 <- X0 %>% lapply(function(x) {x[,f]}) %>% sapply(colMeans) %>% t()
points(center0[,1], center0[,2], pch=3, cex=2, col="green", lwd=2)
# points(X0Pooled[,f], pch=1, cex=0.5, col="blue")

################ Plot phase 1 data
center1 <- X1 %>% lapply(function(x) {x[,f]}) %>% sapply(colMeans) %>% t()
points(center1[,1], center1[,2], pch=3, cex=2, col="blue", lwd=2)
# points(X1Pooled[,f], pch=1, cex=0.5, col="blue")

################ plot phase 1&2 data by subgroup number
center12 <- c(X1,X2) %>% lapply(colMeans) %>% sapply(function(x) {x[f]}) %>% t()
labels <- T2 %>% t() %>% as.data.frame() %>% names() %>% gsub(pattern='V', replacement='')
text(center12, labels=labels, cex = 1, col=ifelse(T2>UCL, "red", "black"))

################ Plot univariate CL's

abline(v=XLim, lty="dashed", col="black", lwd=2)
abline(h=YLim, lty="dashed", col="brown", lwd=2)


References

Baghdadi, A., Cavuoto, L. A., Jones-Farmer, A., Rigdon, S. E., Esfahani, E. T., & Megahed, F. M. (2019). Monitoring worker fatigue using wearable devices: A case study to detect changes in gait parameters. Journal of Quality Technology, Online First. https://doi.org/10.1080/00224065.2019.1640097

Maman, Z. S., Chen, Y.-J., Baghdadi, A., Lombardo, S., Cavuoto, L. A., & Megahed, F. M. (2020). A data analytic framework for physical fatigue management using wearable sensors. Expert Systems with Applications, 113405.

Maman, Z. S., Yazdi, M. A. A., Cavuoto, L. A., & Megahed, F. M. (2017). A data-driven approach to modeling physical fatigue in the workplace using wearable sensors. Applied Ergonomics, 65, 515–529.

Montgomery, D. C. (2019). Introduction to statistical quality control. Wiley.

